CORRELATION ANALYSIS OF ENZYMATIC REACTION OF
A SINGLE PROTEIN MOLECULE
Harvard University

New advances in nanoscience open the door for scientists to
study biological processes on a microscopic molecule-by-molecule basis.
Recent single-molecule biophysical experiments on enzyme systems,
in particular, reveal that enzyme molecules behave fundamentally
differently from what the classical model predicts. A stochastic network
model was previously proposed to explain the experimental discovery.
This paper conducts detailed theoretical and data analyses of
the stochastic network model, focusing on the correlation structure
of the successive reaction times of a single enzyme molecule. We
investigate the correlation of experimental fluorescence intensity and
the correlation of enzymatic reaction times, and examine the role of
substrate concentration in enzymatic reactions. Our study shows that
the stochastic network model is capable of explaining the experimental
data in depth.

1. Introduction. In a chemical reaction, the number of molecules involved
can vary drastically, from millions of moles (a forest devastated by
a fire) to only a few (reactions in a living cell). While most conventional
chemical experiments were designed for a large ensemble in which only the
average could be observed, chemistry textbooks tend to explain what really
happens in a reaction on a molecule-by-molecule basis. This extrapolation
requires the homogeneity assumption: each molecule behaves in the
same way, so the average also represents individual behavior. To verify this
assumption, the kinetics of a single molecule must be observed directly, which
requires sophisticated technology that was not available until the 1990s. Since
then, the development of nanotechnology has enabled scientists to track and
manipulate molecules one by one, and a new age of single-molecule experiments
began [Nie and Zare (1997), Xie and Trautman (1998), Xie and Lu (1999),
Tamarat et al. (2000), Weiss (2000), Moerner (2002), Flomenbom et al.
(2005), Kou, Xie and Liu (2005), Kou (2009)].

Such experiments offer a greatly amplified view of single-molecule dynamics
over long time periods, from seconds to hours, a time scale that far
exceeds what can be achieved by computer-based molecular dynamics
simulation (even with a supercomputer, molecular dynamics simulation
cannot reach beyond milliseconds). Single-molecule experiments
also provide detailed information on the intermediate transition steps of
a biological process that is not available in traditional experiments. Not
surprisingly, these experiments reveal the stochastic nature of nanoscale
particles, long masked by ensemble averages: rather than remaining rigid, those
particles undergo dramatic conformational changes driven by thermal motion.
Future development in this area will provide a deeper understanding
of biological processes [such as molecular motors, Asbury, Fehr and Block
(2003)] and accelerate new technology development [such as single-molecule
gene sequencing, Pushkarev, Neff and Quake (2009)].

Among biomolecules, enzymes play an important role: by lowering the
energy barrier between the reactant and the product, they ensure that many
life-essential processes can be carried out effectively in a living cell. An
aspiration of bioengineers is to design and produce new and efficient
enzymes artificially for specific uses. Studying and understanding the
mechanisms of existing enzymes, therefore, remains one of the central topics
in life science. According to the classical literature, the kinetics of an
enzyme is described by the Michaelis-Menten mechanism [Atkins and de Paula
(2002)]: an enzyme molecule E can bind with a reactant molecule S, which is
referred to as a substrate in the chemistry literature (hence the symbol S),
to form a complex ES. The complex can either dissociate into enzyme and
substrate molecules or undergo a catalytic process to release the product P.
The enzyme then returns to the original state E to start another catalytic
cycle. This process is typically diagrammed as

$$E + S \underset{k_{-1}}{\overset{k_1[S]}{\rightleftharpoons}} ES \xrightarrow{k_2} E^0 + P, \qquad E^0 \xrightarrow{\delta} E, \tag{1.1}$$

where [S] is the substrate concentration ($E^0$ is the release state of the
enzyme), $k_1$ is the association rate per unit substrate concentration,
$k_{-1}$ and $k_2$ are, respectively, the dissociation and catalytic rates,
and $\delta$ is the returning rate. All the transitions are memoryless in the
Michaelis-Menten scheme, so the whole process can be modeled as a
continuous-time Markov chain consisting of three states E, ES and $E^0$ for
an enzyme molecule.

Fig. 1. Fluorescence intensity reading from one experiment (the substrate
concentration is 100 micromolar). Each fluorescence intensity spike is caused
by the release of a reaction product.

A recent single-molecule experiment [English et al. (2006)] conducted by
the Xie group at Harvard University (Department of Chemistry and Chemical
Biology) studied the enzyme β-galactosidase (β-gal), which catalyzes the
breakdown of the sugar lactose and is essential in the human body [Jacobson
et al. (1994), Dorland (2003)]. In the experiment a single β-gal molecule is
immobilized (by linking to a bead bound on a glass coverslip) and immersed
in a buffer solution of the substrate molecules. This setup allows β-gal's
enzymatic action to be monitored continuously under a fluorescence microscope.
To detect the individual turnovers, that is, the enzyme's switching from
the E state to the $E^0$ state, careful design and special treatment were
carried out (such as the use of the fluorogenic substrate
resorufin-β-D-galactopyranoside) so that once the experimental system was
placed under a laser beam, the reaction product, and only the reaction
product, was fluorescent. This setting ensures that as the β-gal enzyme
catalyzes substrate molecules one after another, a strong fluorescence signal
is emitted and detected only when a product is released, that is, only when
the reaction reaches the $E^0 + P$ stage in (1.1). Recording the fluorescence
intensity over time thus enables the experimental determination of individual
turnovers. A sample fluorescence intensity trajectory from this experiment is
shown in Figure 1. High spikes in the trajectory result from intense photon
bursts at the $E^0 + P$ state, while low readings correspond to the E or ES
state. The time lag between two adjacent high fluorescence spikes is the
enzymatic turnover time, that is, the time to complete a catalytic cycle.

Fig. 2. Left column: experimentally observed fluorescence intensity and
turnover time autocorrelations under different substrate concentrations [S]
(20, 100 and 380 micromolar). Right column: the autocorrelations predicted by
the classical Michaelis-Menten model. Under the Michaelis-Menten model, the
turnover time autocorrelation should be zero, and the intensity
autocorrelations should decay exponentially and decay faster under larger
concentration. All contradict the experimental findings.

Examining the experimental data, including the distribution and
autocorrelation of the turnover times as well as the fluorescence intensity
autocorrelation, researchers were surprised to find that the data departed
considerably from the Michaelis-Menten mechanism. Section 2 describes the
experimental findings in detail. Figure 2 illustrates the discrepancy between
the experimental data and the Michaelis-Menten model in terms of the
autocorrelations. The left two panels show the experimentally observed
fluorescence intensity autocorrelation and turnover time autocorrelation
under different substrate concentrations [S]. The right two panels
show the corresponding autocorrelation patterns predicted by the
Michaelis-Menten model. Comparing the bottom two panels, we note that under
the classical Michaelis-Menten model the turnover time autocorrelation should
be zero (hence the horizontal line in the bottom-right panel), which clearly
contradicts the experimental result on the left. From the top two panels we
note that under the Michaelis-Menten model the fluorescence intensity
autocorrelation should decay exponentially and should decay faster with larger
substrate concentration, but the experimental result shows the opposite: the
intensity autocorrelations decay more slowly with larger substrate
concentration, and they do not decay exponentially.

To explain the experimental puzzle, a new stochastic network model was 
introduced [Kou et al. (2005), Kou (2008b)], and it was shown that the 
stochastic network model well explained the experimental distribution of 
the turnover times. The autocorrelation of successive turnover times and 
the correlation of experimental fluorescence intensity, however, were not in- 
vestigated in the previous articles. 

This paper further explores the stochastic network model, concentrating
on the correlation structure of the turnover times and that of the
fluorescence intensity. The rest of the paper is organized as follows.
Section 2 reviews the preceding work, including the experimental observations
and the new stochastic network model. Section 3 analytically calculates the
turnover time autocorrelation and the fluorescence intensity autocorrelation
based on the stochastic network model. These analytical results explain the
multi-exponential decay pattern of the autocorrelation functions. Section 4
discusses how to fit the experimental data within the framework of the
stochastic network model. The paper ends in Section 5 with a summary
and some concluding remarks.

2. Modeling enzymatic reaction. 

2.1. The classical model and its challenge. Under the classical
Michaelis-Menten model (1.1), an enzyme molecule behaves as a three-state
continuous-time Markov chain with the generating matrix (infinitesimal
generator)

$$Q_{MM} = \begin{pmatrix} -k_1[S] & k_1[S] & 0 \\ k_{-1} & -(k_{-1} + k_2) & k_2 \\ \delta & 0 & -\delta \end{pmatrix}.$$

We can readily draw two properties from this continuous-time Markov chain
model.

Proposition 2.1. The density function of the turnover time, the time
that it takes the enzyme to complete one catalytic cycle (i.e., to go from
state E to state $E^0$), is

$$f(t) = \frac{k_1 k_2 [S]}{2p}\left(e^{-(q-p)t} - e^{-(q+p)t}\right),$$

where $p = \sqrt{(k_1[S] + k_2 + k_{-1})^2/4 - k_1 k_2 [S]}$ and $q = (k_1[S] + k_2 + k_{-1})/2$.
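As a numerical sanity check of Proposition 2.1, the following minimal Python sketch evaluates the density with hypothetical rate constants (chosen only for illustration, not fitted to the β-gal data), integrates it with a simple trapezoid rule, and compares the resulting mean with the closed form $(k_1[S] + k_{-1} + k_2)/(k_1 k_2 [S])$, which is what integrating $t f(t)$ analytically gives. The helper names `turnover_density` and `trapezoid` are illustrative.

```python
import math

def turnover_density(t, k1, km1, k2, S):
    """Michaelis-Menten turnover-time density of Proposition 2.1."""
    q = (k1 * S + k2 + km1) / 2.0
    p = math.sqrt(q * q - k1 * k2 * S)
    return k1 * k2 * S / (2.0 * p) * (math.exp(-(q - p) * t) - math.exp(-(q + p) * t))

def trapezoid(f, a, b, n):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# Hypothetical rate constants: k1 per unit [S], the other rates per unit time.
k1, km1, k2, S = 0.05, 20.0, 50.0, 100.0
# With these rates the density decays roughly like exp(-3.5 t), so the tail beyond T_MAX is negligible.
T_MAX, STEPS = 10.0, 100_000

mass = trapezoid(lambda t: turnover_density(t, k1, km1, k2, S), 0.0, T_MAX, STEPS)
mean = trapezoid(lambda t: t * turnover_density(t, k1, km1, k2, S), 0.0, T_MAX, STEPS)
mean_closed_form = (k1 * S + km1 + k2) / (k1 * k2 * S)
print(mass, mean, mean_closed_form)
```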
Proposition 2.2. The successive turnover times have no correlation. 



The first proposition implies that the density of the turnover time is almost
an exponential, since the term $e^{-(q-p)t}$ easily dominates the term
$e^{-(q+p)t}$ for most values of t; see Kou (2008b) for a proof. The second
proposition is a consequence of the Markov property: each turnover time, which
is a first passage time, is independently and identically distributed.
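Proposition 2.2 can also be illustrated by simulation. The sketch below is a standard Gillespie (stochastic simulation) algorithm for the three-state chain (1.1) with hypothetical rates; it records the time between successive product releases (entries into $E^0$), which is what the fluorescence trace measures, and estimates the lag-m sample autocorrelation. Under the Markov property these estimates should be near zero, up to sampling noise of order $1/\sqrt{N}$. The function names are illustrative.

```python
import random

def simulate_turnovers(k1S, km1, k2, delta, n_turnovers, seed=0):
    """Gillespie simulation of E -> ES -> E0 -> E; returns times between product releases."""
    rng = random.Random(seed)
    state, clock, last_release, times = "E", 0.0, None, []
    while len(times) < n_turnovers:
        if state == "E":
            clock += rng.expovariate(k1S)          # binding, rate k1[S]
            state = "ES"
        elif state == "ES":
            clock += rng.expovariate(km1 + k2)     # leave ES at total rate k_-1 + k2
            if rng.random() < k2 / (km1 + k2):     # catalysis: product released, enter E0
                state = "E0"
                if last_release is not None:
                    times.append(clock - last_release)
                last_release = clock
            else:                                  # dissociation back to E
                state = "E"
        else:
            clock += rng.expovariate(delta)        # return E0 -> E, rate delta
            state = "E"
    return times

def lag_correlation(x, lag):
    """Sample correlation between x[i] and x[i + lag]."""
    n = len(x) - lag
    a, b = x[:n], x[lag:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va * vb) ** 0.5

times = simulate_turnovers(k1S=5.0, km1=20.0, k2=50.0, delta=1000.0, n_turnovers=20000)
print([round(lag_correlation(times, m), 3) for m in (1, 2, 3)])
```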

The third property concerns the autocorrelation of the fluorescence
intensity. As we have seen in Figure 1, the experimentally recorded
fluorescence intensity consists of high spikes and low readings. The high
peaks correspond to the release of the fluorescent product (when the enzyme is
at the state $E^0$), whereas the low readings come from the background noise.
We can thus think of the fluorescence intensity reading as a record of an
on-off system: $E^0$ being the on state, E and ES being the off states.

Proposition 2.3. The autocorrelation function of the fluorescence intensity
is proportional to $\exp\{-t(k_{-1} + k_2 + k_1[S])\}$.

The proof of the proposition will be given in Corollary 3.11. This
proposition says that under the Michaelis-Menten model the intensity
autocorrelation decays exponentially, and decays faster with larger substrate
concentration [S].

The results from the single-molecule experiment on β-gal [English et al.
(2006)] contradict all three properties of the Michaelis-Menten model:

(1) The empirical distribution of the turnover time does not exhibit ex- 
ponential decay; see Kou (2008b) for a detailed explanation. 

(2) The experimental turnover time autocorrelations are far from zero, as 
seen in Figure 2. 

(3) The experimental intensity autocorrelations decay neither exponen- 
tially nor faster under larger concentration. See Figure 2. 

2.2. A stochastic network model. We believe these contradictions are
rooted in the molecule's dynamic conformational fluctuation. An enzyme
molecule is not rigid: it experiences constant changes and fluctuations in its
three-dimensional shape and configuration due to entropic and atomic
forces at the nanoscale [Kou and Xie (2004), Kou (2008a)]. For a large
ensemble of molecules, the (nanoscale) conformational fluctuation is buried
in the macroscopic population average; for a single molecule, however, it can
be much more pronounced. Different conformations could have different chemical
properties, resulting in time-varying performance of the enzyme, which can be
studied in the single-molecule experiment. The following stochastic network
model [Kou et al. (2005)] was developed with this idea:
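$$\begin{array}{cccccc}
S + E_1 & \underset{k_{-11}}{\overset{k_{11}[S]}{\rightleftharpoons}} & ES_1 & \xrightarrow{k_{21}} & P + E_1^0, & E_1^0 \xrightarrow{\delta_1} E_1, \\
\updownarrow & & \updownarrow & & \updownarrow & \\
S + E_2 & \underset{k_{-12}}{\overset{k_{12}[S]}{\rightleftharpoons}} & ES_2 & \xrightarrow{k_{22}} & P + E_2^0, & E_2^0 \xrightarrow{\delta_2} E_2, \\
\updownarrow & & \updownarrow & & \updownarrow & \\
\vdots & & \vdots & & \vdots & \\
\updownarrow & & \updownarrow & & \updownarrow & \\
S + E_n & \underset{k_{-1n}}{\overset{k_{1n}[S]}{\rightleftharpoons}} & ES_n & \xrightarrow{k_{2n}} & P + E_n^0, & E_n^0 \xrightarrow{\delta_n} E_n.
\end{array} \tag{2.1}$$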

This is still a Markov chain model, but with 3n states instead of three. The
enzyme still exists as a free enzyme E, an enzyme-substrate complex ES or
a returning enzyme $E^0$, but it can take n different conformations, indexed
by subscripts, in each stage. At each transition, the enzyme can either change
its conformation within the same stage (such as $E_i \to E_j$ or
$ES_i \to ES_j$) or carry out one chemical step, that is, move between the
stages (such as $E_i \to ES_i$, $ES_i \to E_i$ or $ES_i \to E_i^0$). Since
only the product P is fluorescent in the experiment, in model (2.1) any state
$E_i^0$ is an on-state, and the others are off-states. Consequently, the
turnover time is the traverse time between any two on-states $E_i^0$ and
$E_j^0$.

To fully specify the model, we need to stipulate the transition rates. For
$i \neq j$, we use $\alpha_{ij}$, $\beta_{ij}$ and $\gamma_{ij}$ to denote,
respectively, the transition rates of $E_i \to E_j$, $ES_i \to ES_j$ and
$E_i^0 \to E_j^0$; $k_{1i}[S]$, $k_{-1i}$, $k_{2i}$ and $\delta_i$ are,
respectively, the transition rates of $E_i \to ES_i$, $ES_i \to E_i$,
$ES_i \to E_i^0$ and $E_i^0 \to E_i$. Define $Q_{AA}$, $Q_{BB}$ and $Q_{CC}$
to be the square matrices

$$Q_{AA} = [\alpha_{ij}]_{n \times n}, \qquad Q_{BB} = [\beta_{ij}]_{n \times n}, \qquad Q_{CC} = [\gamma_{ij}]_{n \times n},$$

where $\alpha_{ii} = -\sum_{j \neq i} \alpha_{ij}$, $\beta_{ii} = -\sum_{j \neq i} \beta_{ij}$
and $\gamma_{ii} = -\sum_{j \neq i} \gamma_{ij}$. They correspond to transitions
among the $E_i$ states, among the $ES_i$ states and among the $E_i^0$ states,
respectively. Define the diagonal matrices

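$$\begin{aligned}
Q_{AB} &= \operatorname{diag}\{k_{11}[S], k_{12}[S], \ldots, k_{1n}[S]\}, \\
Q_{BA} &= \operatorname{diag}\{k_{-11}, k_{-12}, \ldots, k_{-1n}\}, \\
Q_{BC} &= \operatorname{diag}\{k_{21}, k_{22}, \ldots, k_{2n}\}, \\
Q_{CA} &= \operatorname{diag}\{\delta_1, \delta_2, \ldots, \delta_n\}.
\end{aligned} \tag{2.2}$$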

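They correspond to transitions between the different stages. The generating
matrix of model (2.1) is then

$$Q = \begin{pmatrix} Q_{AA} - Q_{AB} & Q_{AB} & 0 \\ Q_{BA} & Q_{BB} - (Q_{BA} + Q_{BC}) & Q_{BC} \\ Q_{CA} & 0 & Q_{CC} - Q_{CA} \end{pmatrix}. \tag{2.3}$$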
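To make the block structure of (2.3) concrete, the sketch below assembles Q from the blocks in (2.2) using plain Python lists (no linear-algebra library). The function names and the two-conformation rate values are hypothetical; the property checked is simply that every row of a generator sums to zero.

```python
def diag(values):
    """Diagonal matrix (list of lists) with the given diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def rate_block(off):
    """Within-stage block: off-diagonal rates as given, diagonal = minus the off-diagonal row sum."""
    n = len(off)
    return [[off[i][j] if i != j else -sum(off[i][k] for k in range(n) if k != i)
             for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination sum(c * M) of equally sized matrices, given as (c, M) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(cols)] for i in range(rows)]

def network_generator(alpha, beta, gamma, k1, km1, k2, delta, S):
    """Assemble the 3n x 3n generator (2.3); states ordered E_1..E_n, ES_1..ES_n, E_1^0..E_n^0."""
    n = len(k1)
    Z = [[0.0] * n for _ in range(n)]
    QAA, QBB, QCC = rate_block(alpha), rate_block(beta), rate_block(gamma)
    QAB = diag([k * S for k in k1])
    QBA, QBC, QCA = diag(km1), diag(k2), diag(delta)
    grid = [
        [lin((1, QAA), (-1, QAB)), QAB, Z],
        [QBA, lin((1, QBB), (-1, QBA), (-1, QBC)), QBC],
        [QCA, Z, lin((1, QCC), (-1, QCA))],
    ]
    # Concatenate row i of each block in a block row to obtain one full row of Q.
    return [sum((blk[i] for blk in block_row), []) for block_row in grid for i in range(n)]

# Two conformations with hypothetical rates.
Q = network_generator(
    alpha=[[0.0, 2.0], [3.0, 0.0]], beta=[[0.0, 1.0], [4.0, 0.0]], gamma=[[0.0, 5.0], [5.0, 0.0]],
    k1=[0.05, 0.02], km1=[20.0, 10.0], k2=[50.0, 300.0], delta=[1000.0, 1000.0], S=100.0)
print(len(Q), max(abs(sum(row)) for row in Q))   # 6 states; row sums should be (numerically) zero
```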

Under this new model, the distribution of the turnover time, the cor- 
relation of turnover times and the correlation of the fluorescence intensity 
can be analyzed and compared with experimental data. This paper studies 
the autocorrelation of turnover time and the autocorrelation of fluorescence
intensity. 

3. Autocorrelation of turnover time and of fluorescence intensity. 

3.1. Dynamic equilibrium and stationary distribution. In the chemistry
literature, the term "equilibrium" often refers to the state in which all the
macroscopic quantities of a system are time-independent. For the microscopic
system studied in single-molecule experiments, however, macroscopic
quantities are meaningless, and microscopic parameters never cease
to fluctuate. Nonetheless, for a micro-system one can talk about dynamic
equilibrium in the sense that the distribution of the state quantities becomes
time-independent, that is, it reaches the stationary distribution. The
single-molecule enzyme experiment that we consider here falls into this
category, since the enzymatic reactions happen quite fast. We cite the
following lemma [Lemma 3.1 of Kou (2008b)], which gives the stationary
distribution of the Markov chain (2.1):

Lemma 3.1. Let X(t) be the process evolving according to (2.1). Suppose
all the parameters $k_{1i}$, $k_{-1i}$, $k_{2i}$, $\delta_i$, $\alpha_{ij}$,
$\beta_{ij}$ and $\gamma_{ij}$ are positive. Then X(t) is ergodic. Let the row
vectors $\pi_A = (\pi(E_1), \pi(E_2), \ldots, \pi(E_n))$,
$\pi_B = (\pi(ES_1), \ldots, \pi(ES_n))$ and
$\pi_C = (\pi(E_1^0), \ldots, \pi(E_n^0))$ denote the stationary distribution
of the entire network. Up to a normalizing constant, they are determined by

$$\pi_A = -\pi_C Q_{CA} L, \qquad \pi_B = -\pi_C Q_{CA} M, \qquad \pi_C (Q_{CC} - Q_{CA} - Q_{CA} M Q_{BC}) = 0,$$

where the matrices

$$\begin{aligned}
L &= [Q_{AA} - Q_{AB} - Q_{AB}(Q_{BB} - Q_{BA} - Q_{BC})^{-1} Q_{BA}]^{-1}, \\
M &= [Q_{BB} - Q_{BC} - (Q_{BB} - Q_{BA} - Q_{BC}) Q_{AB}^{-1} Q_{AA}]^{-1}.
\end{aligned}$$
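Lemma 3.1 gives the stationary distribution in closed form. As an independent numerical check, one can approximate the solution of $\pi Q = 0$ by power iteration on the uniformized chain $P = I + Q/\lambda$ (a standard technique, not the derivation used in Kou (2008b)). The sketch uses the n = 1 case, where Q reduces to $Q_{MM}$ and L, M are scalars, so the output can be compared with the lemma directly; the rate values and function names are hypothetical.

```python
def stationary_distribution(Q, iters=2000):
    """Approximate pi with pi Q = 0 via power iteration on P = I + Q / lam."""
    n = len(Q)
    lam = 1.1 * max(-Q[i][i] for i in range(n))   # exceeds every exit rate, so P has a positive diagonal
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [x / total for x in pi]

a, b, c, d = 5.0, 20.0, 50.0, 10.0   # hypothetical k1[S], k_-1, k2, delta
Q_MM = [[-a, a, 0.0], [b, -(b + c), c], [d, 0.0, -d]]
pi = stationary_distribution(Q_MM)
residual = [sum(pi[i] * Q_MM[i][j] for i in range(3)) for j in range(3)]

# Lemma 3.1 with n = 1: Q_AA = Q_BB = 0, so L and M are scalars.
D = -(b + c)                  # Q_BB - Q_BA - Q_BC
L = 1.0 / (-a - a * b / D)    # [Q_AA - Q_AB - Q_AB D^{-1} Q_BA]^{-1}
M = 1.0 / (-c)                # [Q_BB - Q_BC - D Q_AB^{-1} Q_AA]^{-1}
print(pi, -pi[2] * d * L, -pi[2] * d * M)   # last two should match pi[0] and pi[1]
```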

Under the stochastic network model (2.1), a turnover event can start from
any state $E_i$ and end in any $E_j^0$. It follows that the overall distribution
of all the turnover times is a mixture distribution, with weights given by the
stationary probabilities of a turnover event's starting from $E_i$. The
following lemma, based on Lemma 3.4 of Kou (2008b), provides these stationary
probabilities.

Lemma 3.2. Let w be the row vector $w = (w(E_1), w(E_2), \ldots, w(E_n))$,
where $w(E_i)$ denotes the stationary probability of a turnover event's starting
from state $E_i$. Then, up to a normalizing constant, w is the nonzero solution
of

$$w\left(I + M Q_{BC} (I - Q_{CA}^{-1} Q_{CC})^{-1}\right) = 0. \tag{3.1}$$
3.2. Autocorrelation of turnover time.



Expectation of turnover time. Enzyme turnover events occur one after
another. Each can start from any $E_i$ and end in any $E_j^0$. The next
turnover may start from some $E_k$, possibly with $k \neq j$, when the system
exits the $E^0$ stage from $E_j^0$. To calculate the correlation between
turnover times, we need the probabilities of all these combinations and the
expected turnover times. We introduce the following notation.

Let $T_{E_i}$ and $T_{ES_i}$ denote the first passage times of reaching the set
$\{E_1^0, E_2^0, \ldots, E_n^0\}$ from $E_i$ and $ES_i$, respectively. Let
$P_{E_iE_j^0}$ and $P_{ES_iE_j^0}$ be the probabilities that a turnover event,
starting, respectively, from $E_i$ and $ES_i$, ends in $E_j^0$. Let
$P_{E_i^0E_j}$ denote the probability that, after the previous turnover ends in
$E_i^0$, a new turnover event starts from $E_j$. Finally, let $T_{E_iE_j^0}$ and
$T_{ES_iE_j^0}$ be the first passage times of reaching the state $E_j^0$ from
$E_i$ and $ES_i$, respectively.

For the values of $E(T_{E_i})$ and $E(T_{ES_i})$, we cite the following lemma
[Corollary 3.3 of Kou (2008b)].

Lemma 3.3. Let the vectors $\mu_A = (E(T_{E_1}), E(T_{E_2}), \ldots, E(T_{E_n}))^T$
and $\mu_B = (E(T_{ES_1}), \ldots, E(T_{ES_n}))^T$ denote the mean first passage
times. Then they are given by

$$\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix} = \begin{pmatrix} -(L + M)\mathbf{1} \\ -(N + R)\mathbf{1} \end{pmatrix}, \tag{3.2}$$

where $\mathbf{1}$ is the column vector of ones and the matrices N and R are
given by

$$\begin{aligned}
N &= [Q_{AA} - (Q_{AA} - Q_{AB}) Q_{BA}^{-1} (Q_{BB} - Q_{BC})]^{-1}, \\
R &= [Q_{BB} - Q_{BA} - Q_{BC} - Q_{BA} (Q_{AA} - Q_{AB})^{-1} Q_{AB}]^{-1}.
\end{aligned}$$

For the probabilities $P_{E_iE_j^0}$, $P_{ES_iE_j^0}$ and $P_{E_i^0E_j}$, we have
the following lemma.

Lemma 3.4. Let $P_{AC}$, $P_{BC}$ and $P_{CA}$ be the probability matrices
$P_{AC} = [P_{E_iE_j^0}]_{n \times n}$, $P_{BC} = [P_{ES_iE_j^0}]_{n \times n}$ and
$P_{CA} = [P_{E_i^0E_j}]_{n \times n}$. Then they are given by

$$P_{AC} = -M Q_{BC}, \qquad P_{BC} = -R Q_{BC}, \qquad P_{CA} = (I - Q_{CA}^{-1} Q_{CC})^{-1}. \tag{3.3}$$
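The formulas of Lemmas 3.3 and 3.4 can be checked in exact rational arithmetic for a small network. The sketch below uses n = 2 with hypothetical rates (only $Q_{AA}$, $Q_{BB}$, $Q_{AB}$, $Q_{BA}$ and $Q_{BC}$ enter), verifies that each row of $P_{AC}$ sums to one, and confirms that $\mu_A$ and $\mu_B$ satisfy the first-passage equations implied by (2.3), namely $(Q_{AA} - Q_{AB})\mu_A + Q_{AB}\mu_B = -\mathbf{1}$ and $Q_{BA}\mu_A + (Q_{BB} - Q_{BA} - Q_{BC})\mu_B = -\mathbf{1}$. The helper names are illustrative.

```python
from fractions import Fraction as F

def inv(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def lin(*terms):
    """Linear combination sum(c * M) of 2 x 2 matrices, given as (c, M) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)] for i in range(2)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]

def diag(x, y):
    return [[x, F(0)], [F(0), y]]

# Hypothetical n = 2 rates, kept as exact fractions so the checks are exact.
S = F(100)
QAA = [[F(-2), F(2)], [F(3), F(-3)]]       # alpha_ij
QBB = [[F(-1), F(1)], [F(4), F(-4)]]       # beta_ij
QAB = diag(F(1, 20) * S, F(1, 50) * S)     # k_1i [S]
QBA = diag(F(20), F(10))                   # k_-1i
QBC = diag(F(50), F(300))                  # k_2i

D = lin((1, QBB), (-1, QBA), (-1, QBC))    # Q_BB - Q_BA - Q_BC
A = lin((1, QAA), (-1, QAB))               # Q_AA - Q_AB
L = inv(lin((1, A), (-1, mul(mul(QAB, inv(D)), QBA))))
M = inv(lin((1, QBB), (-1, QBC), (-1, mul(mul(D, inv(QAB)), QAA))))
N = inv(lin((1, QAA), (-1, mul(mul(A, inv(QBA)), lin((1, QBB), (-1, QBC))))))
R = inv(lin((1, D), (-1, mul(mul(QBA, inv(A)), QAB))))

P_AC = lin((-1, mul(M, QBC)))                          # Lemma 3.4
mu_A = [-(sum(L[i]) + sum(M[i])) for i in range(2)]    # Lemma 3.3: -(L + M) 1
mu_B = [-(sum(N[i]) + sum(R[i])) for i in range(2)]    # Lemma 3.3: -(N + R) 1

# Residuals of the first-passage equations; both should be exactly zero.
res_A = [x + y + 1 for x, y in zip(matvec(A, mu_A), matvec(QAB, mu_B))]
res_B = [x + y + 1 for x, y in zip(matvec(QBA, mu_A), matvec(D, mu_B))]
print([sum(row) for row in P_AC], res_A, res_B, [float(x) for x in mu_A])
```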
For the expectations of $T_{E_iE_j^0}$ and $T_{ES_iE_j^0}$, we have the following.

Lemma 3.5. Let

$$E_{AC} = [P_{E_iE_j^0} E(T_{E_iE_j^0})]_{n \times n} \quad \text{and} \quad E_{BC} = [P_{ES_iE_j^0} E(T_{ES_iE_j^0})]_{n \times n}$$

be two $n \times n$ matrices. Then they are given by

$$E_{AC} = (LM + MR) Q_{BC}, \qquad E_{BC} = (NM + RR) Q_{BC}.$$

We defer the proofs of Lemmas 3.4 and 3.5 to the Appendix.

Correlation of the turnover times. Let $T^i$ denote the ith turnover time.
The next theorem, based on Lemmas 3.1 to 3.5, gives the autocorrelation
of the successive turnover times. We defer its proof to the Appendix.

Theorem 3.6. The covariance between the first turnover and the mth
turnover (m > 1) is given by

$$\operatorname{cov}(T^1, T^m) = -w\left(L + M(I - Q_{AB}^{-1} Q_{AA})\right)\left[(P_{AC} P_{CA})^{m-1} - \mathbf{1} w\right] \mu_A,$$

where $P_{AC} P_{CA} = -M Q_{BC} (I - Q_{CA}^{-1} Q_{CC})^{-1}$.

The matrix $P_{AC} P_{CA}$ is the product of two transition-probability
matrices, so it is a stochastic matrix. Given that all the states in the
stochastic network model communicate with each other, $P_{AC} P_{CA}$ is also
irreducible, and all its entries are positive. According to the
Perron-Frobenius theorem [Horn and Johnson (1985)], such a matrix has
eigenvalue one with algebraic multiplicity one, and the absolute values of its
other eigenvalues are strictly less than one. We therefore obtain the
following corollary of Theorem 3.6.

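Corollary 3.7. Suppose that $P_{AC} P_{CA}$ is diagonalizable:

$$P_{AC} P_{CA} = U \Lambda U^{-1} = \mathbf{1} w + \sum_{i=2}^{n} \lambda_i \varphi_i \psi_i,$$

where the diagonal matrix $\Lambda = \operatorname{diag}(1, \lambda_2, \ldots, \lambda_n)$
consists of the eigenvalues of $P_{AC} P_{CA}$ with $|\lambda_i| < 1$ for
$i \geq 2$; the columns $\mathbf{1}, \varphi_2, \ldots, \varphi_n$ of the matrix U
are the corresponding right eigenvectors; and the rows
$w, \psi_2, \ldots, \psi_n$ of $U^{-1}$ are the corresponding left eigenvectors.
Then we have

$$\operatorname{cov}(T^1, T^m) = \sum_{i=2}^{n} \sigma_i \lambda_i^{m-1}, \tag{3.4}$$

where $\sigma_i = -w\left(L + M(I - Q_{AB}^{-1} Q_{AA})\right) \varphi_i \psi_i \mu_A$.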

Although the matrix $P_{AC} P_{CA}$ may have complex eigenvalues, these
complex eigenvalues and the corresponding eigenvectors always appear in
conjugate pairs, so the imaginary parts in (3.4) cancel each other. As a
result, we can treat all $\lambda_i$ and $\sigma_i$ as if they were real numbers.
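For n = 2 the decomposition in Corollary 3.7 can be written down by hand. Taking a hypothetical stochastic matrix $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ as a stand-in for $P_{AC} P_{CA}$, its non-unit eigenvalue is $\lambda_2 = 1 - a - b$, its stationary row vector is $w = (b, a)/(a + b)$, and $P^m - \mathbf{1}w = \lambda_2^m (I - \mathbf{1}w)$, so (3.4) reduces to the single geometric term $\sigma_2 \lambda_2^{m-1}$. The sketch checks this identity in exact arithmetic.

```python
from fractions import Fraction as F

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matpow(p, m):
    result = [[F(1), F(0)], [F(0), F(1)]]
    for _ in range(m):
        result = matmul(result, p)
    return result

a, b = F(1, 5), F(3, 10)           # hypothetical switching probabilities
P = [[1 - a, a], [b, 1 - b]]       # stand-in for P_AC P_CA
lam2 = 1 - a - b                   # non-unit eigenvalue
w = [b / (a + b), a / (a + b)]     # left eigenvector for eigenvalue 1, normalized to sum 1
one_w = [w, w]                     # the rank-one matrix 1 w

for m in range(1, 8):
    Pm = matpow(P, m)
    lhs = [[Pm[i][j] - one_w[i][j] for j in range(2)] for i in range(2)]
    rhs = [[lam2 ** m * ((1 if i == j else 0) - one_w[i][j]) for j in range(2)] for i in range(2)]
    assert lhs == rhs              # P^m - 1w = lam2^m (I - 1w), exactly
print(lam2, w)
```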

Theorem 3.6, along with Corollary 3.7, provides an explanation of why
the correlation of turnover times is not zero. At first sight, this seems to
contradict the memoryless property of a Markov chain. What actually happens
is that the state must be specified explicitly for the memoryless property
to hold (i.e., one needs to specify exactly whether an enzyme is at state $E_1$
or $E_2$), whereas in the single-molecule experiment we only know whether
the system is in an "on" or "off" state (e.g., one only knows that the enzyme
is in one of the on-states $E_1^0, \ldots, E_n^0$). When there are multiple
states, this aggregation leads to incomplete information that destroys the
independence between successive turnovers: each turnover time carries some
information about its reaction path, which is correlated with the reaction
path of the next turnover, resulting in correlation between successive
turnover times.

Corollary 3.7 also states that since $|\lambda_i| < 1$, the autocorrelation is a
mixture of exponential decays. Thus, depending on the relative scales of the
eigenvalues, the actual decay might be single-exponential when one eigenvalue
dominates the others, or multi-exponential when several major eigenvalues
jointly contribute to the decay.

Fast enzyme reset. In most enzymatic reactions, including the one we
study, the enzyme returns very quickly to restart a new cycle once the
product is released [Segel (1975)]. Such enzymes are called fast-cycle-reset
enzymes. To model this fact, we let $\delta_i$ ($i = 1, 2, \ldots, n$), the
transition rate from $E_i^0$ to $E_i$, go to infinity. Then any enzyme in state
$E_i^0$ will always return to state $E_i$ instantly, and the transition
probability matrix $P_{CA}$ defined in (3.3) becomes the identity matrix.
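The limit $P_{CA} \to I$ can be seen numerically from (3.3). In the sketch below, n = 2, $Q_{CA} = \delta I$, and $Q_{CC}$ has a hypothetical symmetric off-diagonal rate $\gamma$; the off-diagonal entry of $P_{CA}$ (the chance that the next turnover starts from a different conformation) then equals $(\gamma/\delta)/(1 + 2\gamma/\delta)$ and vanishes as $\delta$ grows. The function names are illustrative.

```python
from fractions import Fraction as F

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def p_ca(delta, gamma):
    """P_CA = (I - Q_CA^{-1} Q_CC)^{-1} from (3.3), with n = 2 and Q_CA = delta * I."""
    QCC = [[-gamma, gamma], [gamma, -gamma]]   # hypothetical symmetric E_1^0 <-> E_2^0 rate
    A = [[(1 if i == j else 0) - QCC[i][j] / delta for j in range(2)] for i in range(2)]
    return inv2(A)

for delta in (F(1), F(10), F(100), F(1000)):
    P = p_ca(delta, F(5))
    print(delta, P[0][1], float(P[0][1]))   # off-diagonal entry shrinks as delta grows
```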

3.3. Autocorrelation of fluorescence intensity. 

Correlation of intensity as a function of time. In the single-enzyme
experiments, the raw data are the time traces of fluorescence intensity, as
shown in Figure 1. The time lag between two adjacent high fluorescence
spikes gives the enzymatic turnover time. The fluorescence intensity reading,
however, is subject to detection error caused by the limited time resolution
$\Delta t$ of the detector. Starting from time 0, the detector only records
intensity data at multiples of $\Delta t$: $0, \Delta t, 2\Delta t, \ldots, k\Delta t, \ldots$.
The intensity reading at time $k\Delta t$ is actually the total number of
photons received during the period $((k-1)\Delta t, k\Delta t)$. Thus, the
detection error of a turnover time is roughly $\Delta t$. When the successive
reactions occur slowly, the average turnover time is much longer than
$\Delta t$, and the error is negligible. But when the reactions happen very
frequently, the average turnover time becomes comparable to $\Delta t$, and
this error cannot be ignored. In fact, when the substrate concentration is
high enough, the enzyme reaches the "on" states so frequently that most of the
intensity readings are very high, making it impossible to determine the
individual turnover times reliably. In this situation, it is necessary to
study the behavior of the raw intensity reading directly.

There are two main sources of the photons generated in the experiment:
the weak but perpetual background noise and the strong but short-lived
burst. The numbers of photons received from the two sources can be
modeled as two independent Poisson processes with different rates. We can
use the following equation to represent I(t), the intensity recorded at time t:

$$I(t) = N_t(T_{on}(t)) + N_t^0(\Delta t), \tag{3.5}$$

where $N_t(s)$ and $N_t^0(s)$ represent the total numbers of photons received
due to the burst and to the background noise, respectively, within a
length-s subinterval of $(t - \Delta t, t)$; $T_{on}(t)$ is the total time that
the enzyme system spends at the "on" states (any $E_i^0$) within the time
interval $(t - \Delta t, t)$. $N_t(s)$ and $N_t^0(s)$ are independent Poisson
processes with rates $\nu$ and $\nu_0$, respectively. With this representation,
we have the following theorem, whose proof is deferred to the Appendix.
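A minimal simulation of the observation model (3.5), assuming for illustration that the on-time $T_{on}(t)$ within one detector bin is fixed (in the model it is itself random): photon counts are generated by accumulating exponential inter-arrival times, so, given $T_{on}$, each reading is Poisson with mean $\nu T_{on} + \nu_0 \Delta t$. The rates, bin length and function names are hypothetical.

```python
import random

def poisson_count(rate, length, rng):
    """Arrivals of a rate-`rate` Poisson process in an interval of the given length."""
    count, t = 0, rng.expovariate(rate)
    while t <= length:
        count += 1
        t += rng.expovariate(rate)
    return count

def intensity_reading(t_on, dt, nu, nu0, rng):
    """One reading of (3.5): burst photons over the on-time plus background photons over dt."""
    return poisson_count(nu, t_on, rng) + poisson_count(nu0, dt, rng)

rng = random.Random(1)
dt, t_on, nu, nu0 = 1.0, 0.05, 200.0, 2.0   # hypothetical bin length, on-time and photon rates
readings = [intensity_reading(t_on, dt, nu, nu0, rng) for _ in range(20_000)]
mean = sum(readings) / len(readings)
var = sum((x - mean) ** 2 for x in readings) / (len(readings) - 1)
print(mean, var)   # both should be close to nu * t_on + nu0 * dt = 12
```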

Theorem 3.8. The covariance of the fluorescence intensity is

$$\operatorname{cov}(I(0), I(t)) \propto \sum_{i=2}^{3n} C_i e^{\mu_i (t - \Delta t)}, \tag{3.6}$$

where $\mu_i$ are the nonzero eigenvalues of the generating matrix Q defined
in (2.3), and $C_i$ are constants depending only on Q.

Since $-Q$ is a semi-stable matrix [Horn and Johnson (1985)], the real parts
of all $\mu_k$ ($k > 1$) are negative. For a real matrix, the complex
eigenvalues and their eigenvectors always appear in conjugate pairs; thus,
the imaginary parts cancel each other in (3.6), and only the real parts are
left. Therefore, according to Theorem 3.8, the covariance of intensity decays
multi-exponentially.

Fast enzyme reset and intensity autocorrelation. A fast-cycle-reset enzyme
jumps from state $E_i^0$ to $E_i$ with little delay. A short burst of photons
is released during the enzyme's short stay at $E_i^0$. For fast-cycle-reset
enzymes, the behavior of the whole system can be well approximated by an
alternative system in which only the states $E_i$ and $ES_i$
($i = 1, 2, \ldots, n$) exist: the transition rates among the E's, among the
ES's, and from $E_i$ to $ES_i$ are exactly the same as in the original system,
but the transition rate from $ES_i$ to $E_i$ is changed from $k_{-1i}$ to
$k_{-1i} + k_{2i}$, since once a transition $ES_i \to E_i^0$ occurs, the enzyme
quickly moves to $E_i$. We can thus think of lumping $E_i^0$ and $E_i$ together
to form the alternative system, which has the generating matrix

$$K = \begin{pmatrix} Q_{AA} - Q_{AB} & Q_{AB} \\ Q_{BA} + Q_{BC} & Q_{BB} - (Q_{BA} + Q_{BC}) \end{pmatrix}. \tag{3.7}$$

K is also a negative semi-stable matrix, with 2n eigenvalues, one of which is
zero. The following theorem details how well the eigenvalues of K
approximate those of Q.
Theorem 3.9. Assume $Q_{CA} = \delta \operatorname{diag}\{q_1, \ldots, q_n\}$,
where $q_1, \ldots, q_n$ are fixed constants, while $\delta$ is large. Let
$\kappa_i$ ($i = 2, 3, \ldots, 2n$) denote the nonzero eigenvalues of K. Then
for each $\kappa_i$, there exists an eigenvalue $\mu_i$ of Q such that

$$|\mu_i - \kappa_i| = O(\delta^{-1/2}).$$

The other n eigenvalues of Q satisfy

$$|\mu_i + \delta q_{i-2n}| = O(1), \qquad i = 2n+1, \ldots, 3n.$$

The proof is deferred to the Appendix. This theorem says that for
fast-cycle-reset enzymes with large $\delta$, the first $2n - 1$ nonzero
eigenvalues of Q can be approximated by the eigenvalues of K, while the other
n eigenvalues $\mu_{2n+1}, \ldots, \mu_{3n}$ of Q are of the same order as
$\delta$. Since all the nonzero eigenvalues have negative real parts,
according to (3.6) the terms associated with $\mu_{2n+1}, \ldots, \mu_{3n}$ decay
much faster, so their contribution can be ignored. Thus, we have the
following results for the intensity autocorrelation.
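For n = 1 the content of Theorem 3.9 can be checked in closed form. With $a = k_1[S]$, $b = k_{-1}$, $c = k_2$ and $d = \delta$ (so $q_1 = 1$), the nonzero eigenvalues of $Q_{MM}$ are the roots of $x^2 + (a+b+c+d)x + (ac + ad + bd + cd) = 0$, while K has the single nonzero eigenvalue $\kappa = -(a+b+c)$. The sketch (hypothetical rates) shows one root approaching $\kappa$ and the other staying within O(1) of $-\delta$. For small $\delta$ the two roots can be complex, so the sketch assumes $\delta$ is large enough for them to be real.

```python
import math

def q_nonzero_eigs(a, b, c, d):
    """Nonzero eigenvalues of [[-a, a, 0], [b, -(b + c), c], [d, 0, -d]] (assumed real)."""
    s1 = a + b + c + d
    s2 = a * c + a * d + b * d + c * d
    disc = s1 * s1 - 4 * s2
    if disc < 0:
        raise ValueError("complex eigenvalues; choose a larger delta")
    root = math.sqrt(disc)
    return (-s1 + root) / 2, (-s1 - root) / 2

a, b, c = 5.0, 20.0, 50.0        # hypothetical k1[S], k_-1, k2
kappa = -(a + b + c)              # nonzero eigenvalue of K (Corollary 3.11)
gaps = []
for d in (1e3, 1e4, 1e5):
    mu_slow, mu_fast = q_nonzero_eigs(a, b, c, d)
    gaps.append((abs(mu_slow - kappa), abs(mu_fast + d)))
    print(d, gaps[-1])
```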

Corollary 3.10. For fast-cycle-reset enzymes ($\delta \to \infty$),

$$\operatorname{cov}(I(0), I(t)) \propto \sum_{i=2}^{2n} C_i e^{\kappa_i (t - \Delta t)},$$

where $\kappa_i$ are the nonzero eigenvalues of the matrix K defined in (3.7).

Corollary 3.11. For the classical Michaelis-Menten model, where n = 1,

$$K = \begin{pmatrix} -k_1[S] & k_1[S] \\ k_2 + k_{-1} & -k_2 - k_{-1} \end{pmatrix}.$$

The only nonzero eigenvalue is $-(k_{-1} + k_2 + k_1[S])$. We thus have, for
fast-cycle-reset enzymes,

$$\operatorname{cov}(I(0), I(t)) \propto e^{-(k_{-1} + k_2 + k_1[S])(t - \Delta t)}.$$
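The Michaelis-Menten prediction that the intensity autocorrelation decays faster at larger [S] can be tabulated directly. Because each row of K sums to zero, one eigenvalue is 0 and the other equals the trace of K. The rate constants below are hypothetical; the concentrations are the three used in Figure 2.

```python
def k_matrix(k1, km1, k2, S):
    """Lumped generator K of Corollary 3.11 (n = 1)."""
    return [[-k1 * S, k1 * S], [k2 + km1, -(k2 + km1)]]

def nonzero_eigenvalue(K):
    # Rows of a generator sum to zero, so 0 is an eigenvalue and the other one is the trace.
    return K[0][0] + K[1][1]

def char_poly(K, x):
    """det(K - x I) for a 2 x 2 matrix."""
    return (K[0][0] - x) * (K[1][1] - x) - K[0][1] * K[1][0]

k1, km1, k2 = 0.05, 20.0, 50.0    # hypothetical rate constants
rates = []
for S in (20.0, 100.0, 380.0):    # concentrations of Figure 2 (micromolar)
    K = k_matrix(k1, km1, k2, S)
    lam = nonzero_eigenvalue(K)
    rates.append(-lam)            # decay rate k_-1 + k2 + k1[S]
    print(S, -lam, char_poly(K, lam))
```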

4. From theory to data. We have shown in the preceding sections that
the autocorrelation of turnover times and the correlation of intensity follow

$$\operatorname{cov}(T^1, T^m) \propto \sum_i \lambda_i^{m-1} \sigma_i, \qquad \operatorname{cov}(I(0), I(t)) \propto \sum_i e^{\kappa_i (t - \Delta t)} C_i.$$

Before applying these equations to fit the experimental data, the following
problems must be addressed. First, we know so far that the decay patterns
must be multi-exponential, but we do not yet know how the eigenvalues
are related to the rate constants ($k_{1i}$, $k_{-1i}$, $k_{2i}$, etc.) and the
substrate concentration [S], which is the only adjustable parameter in the
experiment. Second, we do not know the expressions of the coefficients
($\sigma_i$ and $C_i$). Third,






C. DU AND S. C. KOU 



we do not know the number of distinct conformations n. We only know that 
it must be large: each enzyme consists of hundreds of vibrating atoms, and, 
as a whole, it expands and rotates in the 3-dimensional space within the 
constraint of chemical bonds. We next address these questions before fitting 
the experimental data. 

4.1. Eigenvalues as functions of rate constants and substrate concentration.
In the enzyme experiments, the transition rates are intrinsic properties of
the enzyme and the enzyme-substrate complex; they are not subject to
experimental control. The only variable subject to experimental control is
the concentration of the substrate molecules [S]. The higher the
concentration, the more likely the enzyme molecule is to bind with a substrate
molecule to form a complex. This is why the association rate $k_{1i}[S]$ (the
rate of $E_i \to ES_i$) is proportional to the concentration. The experiments
were repeated under different concentrations, resulting in different decay
patterns of the autocorrelation functions, as in Figure 2. A successful theory
should be able to explain the relationship between concentration and
autocorrelation decay pattern.

The concentration only affects the transition rates from $E_i$ to $ES_i$,
which are collected in $Q_{AB}$ in (2.2). Define
$\tilde{Q}_{AB} = \operatorname{diag}\{k_{11}, k_{12}, \ldots, k_{1n}\}$, so that
$Q_{AB} = [S]\,\tilde{Q}_{AB}$.

Four scenarios for simplification. To delineate the relationship between [S]
and the autocorrelation decay pattern, we next simplify the generating
matrices. Below are the four scenarios that we will consider. Each of the
scenarios guarantees the classical Michaelis-Menten equation, a hyperbolic
relationship between the reaction rate and the substrate concentration,

$$v = \frac{1}{E(T)} = \frac{1}{w \mu_A} \propto \frac{[S]}{[S] + C} \quad \text{with some constant } C, \tag{4.1}$$

which was observed in both the traditional and the single-molecule enzyme
experiments [see Kou et al. (2005), English et al. (2006) and Kou (2008b) for
detailed discussion]. Each scenario has its own biochemical implications.
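In the simplest case n = 1, (4.1) follows directly from the mean of the density in Proposition 2.1: $1/E(T) = k_2[S]/([S] + (k_{-1} + k_2)/k_1)$, so C is the Michaelis constant $(k_{-1} + k_2)/k_1$. The sketch below checks this identity with exact fractions for hypothetical rate constants.

```python
from fractions import Fraction as F

def mean_turnover(k1, km1, k2, S):
    """Mean E -> E^0 time for n = 1, from integrating t f(t) in Proposition 2.1."""
    return (k1 * S + km1 + k2) / (k1 * k2 * S)

k1, km1, k2 = F(1, 20), F(20), F(50)    # hypothetical rate constants
C = (km1 + k2) / k1                      # Michaelis constant: the C of (4.1) when n = 1
for S in (F(20), F(100), F(380)):
    v = 1 / mean_turnover(k1, km1, k2, S)
    assert v == k2 * S / (S + C)         # hyperbolic in [S], saturating at k2
    print(S, v, float(v))
```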

Scenario 1. There are no or negligible transitions among the $E_i$ states,
that is, $\alpha_{ij} \to 0$ for $i \neq j$.

Scenario 2. There are no or negligible transitions among the $ES_i$ states,
that is, $\beta_{ij} \to 0$ for $i \neq j$.

Scenarios 1 and 2 correspond to the so-called slow fluctuating enzymes
(whose conformations fluctuate slowly over time).

Scenario 3. The transitions among the $E_i$ states are much faster than the
others, that is, $Q_{AA} = \tau \tilde{Q}_{AA}$, and the scale $\tau \gg 1$ is much
larger than the other transition rates.

Scenario 4. The transitions among the $ES_i$ states are much faster than
the others, that is, $Q_{BB} = \tau \tilde{Q}_{BB}$, and the scale $\tau \gg 1$ is
much larger than the other transition rates.

Scenarios 3 and 4 correspond to the so-called fast fluctuating enzymes
(whose conformations fluctuate fast).

Remark. In the previous work [Kou (2008b)], there are two other scenarios that can also give rise to the hyperbolic relationship (4.1): primitive enzymes, whose dissociation rate is much larger than their catalytic rate [Albery and Knowles (1976), Min et al. (2006), Min et al. (2005a)], and conformational-equilibrium enzymes, whose energy-barrier difference between dissociation and catalysis is invariant across conformations [Min et al. (2006)]. Our analysis under those two scenarios, however, does not lead to any meaningful conclusion, so we omit them here.

The effect of concentration on turnover time autocorrelation. Based on 
the four scenarios, we have the following theorem for autocorrelation of 
turnover times. 

Theorem 4.1. For enzymes with fast cycle reset, the transition probability matrix governing the autocorrelation of turnover times is

$$P_{AC}P_{CA} = -MQ_{BC}\,(I - Q_{CA}^{-1}Q_{CC})^{-1} = -MQ_{BC},$$

where the second equality reflects the fast cycle reset ($Q_{CA}^{-1} \to 0$). Its eigenvalues $\lambda_i$, under the four different scenarios, satisfy the following:

Scenario 1. The $\lambda_i$ do not depend on [S], the substrate concentration. Thus, the autocorrelation decay should be similar for all concentrations.

Scenario 2. The $\lambda_i$ depend on [S] hyperbolically. More precisely, if we write $\lambda_i([S])$ $(i = 1, 2, \ldots, n)$ to emphasize the dependence of the eigenvalues on [S], we have

$$\lambda_i([S]) = \frac{1}{1 - \big(1 - 1/\lambda_i(1)\big)/[S]}.$$

Thus, the autocorrelation decay should be slower under larger concentration.

Scenarios 3 or 4. The non-unit eigenvalues are of order $\tau^{-1}$, so the autocorrelation should decay extremely fast for all concentrations.
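The scenario 2 relation can be checked numerically. The sketch below assumes a hypothetical two-conformation model (n = 2) with $Q_{BB} = 0$, builds $-MQ_{BC}$ from the expression for M used in the Appendix proof of Theorem 4.1, and compares its non-unit eigenvalue at several concentrations with the formula for $\lambda_i([S])$. All rate values are made up for illustration; because $-MQ_{BC}$ is a 2 × 2 stochastic matrix, its non-unit eigenvalue is simply its trace minus 1.

```python
# Hypothetical n = 2 check of Theorem 4.1, scenario 2 (Q_BB = 0).
# With Q_BB = 0, M^{-1} = -Q_BC + (Q_BA + Q_BC) Q_AB^{-1} Q_AA, where
# Q_AB = [S] diag(k1), Q_BA = diag(km1), Q_BC = diag(k2).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def turnover_matrix_scenario2(S, k1, km1, k2, Q_AA):
    """Return -M Q_BC for scenario 2."""
    Q_AB_inv = [[1.0 / (S * k1[0]), 0.0], [0.0, 1.0 / (S * k1[1])]]
    Q_BA_plus_BC = [[km1[0] + k2[0], 0.0], [0.0, km1[1] + k2[1]]]
    X = matmul(Q_BA_plus_BC, matmul(Q_AB_inv, Q_AA))
    M_inv = [[X[i][j] - (k2[i] if i == j else 0.0) for j in range(2)] for i in range(2)]
    M = inv2(M_inv)
    # right-multiplying by the diagonal Q_BC scales column j by k2[j]
    return [[-M[i][j] * k2[j] for j in range(2)] for i in range(2)]


def nonunit_eigenvalue(P):
    """A 2x2 stochastic matrix has eigenvalues 1 and trace(P) - 1."""
    return P[0][0] + P[1][1] - 1.0


def predicted(S, lam_at_1):
    """lambda_i([S]) = 1 / (1 - (1 - 1/lambda_i(1)) / [S])."""
    return 1.0 / (1.0 - (1.0 - 1.0 / lam_at_1) / S)


if __name__ == "__main__":
    k1, km1, k2 = (1.0, 2.0), (3.0, 1.0), (2.0, 4.0)   # illustrative rates
    Q_AA = [[-0.5, 0.5], [0.2, -0.2]]                   # rows sum to zero
    lam1 = nonunit_eigenvalue(turnover_matrix_scenario2(1.0, k1, km1, k2, Q_AA))
    for S in (1.0, 4.0, 20.0):
        P = turnover_matrix_scenario2(S, k1, km1, k2, Q_AA)
        print(S, nonunit_eigenvalue(P), predicted(S, lam1))
```

The printed eigenvalue grows with [S], which is the slower turnover-time decay at higher concentration described in the theorem.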

This theorem tells us that for fast fluctuating enzymes (scenarios 3 or 4), the turnover time correlation tends to zero. Intuitively, this is because fast fluctuating enzymes favor conformational fluctuation over the binding-association-catalytic path that leads to the product. Within a single turnover event, the enzyme therefore undergoes extensive conformational changes, which effectively blur the information about the reaction path carried by the turnover time, resulting in zero correlation. Under scenario 1, the autocorrelation decay pattern does not vary with the concentration. This is because when the enzyme does not fluctuate, it goes from $E_i$ to $ES_i$ directly, so a change of concentration does not alter the distribution of the reaction path. Thus, the correlation between turnover times does not depend on the concentration.
The results from scenarios 1, 3 and 4 contradict the experimental finding: the turnover times are correlated, and the correlation is stronger under higher concentration (see Figure 2). Only scenario 2 fully agrees with the experiments, suggesting that the enzyme-substrate complex ($ES_i$) does not fluctuate much. This is supported by recent single-molecule experimental findings [Lu, Xun and Xie (1998), Yang et al. (2003), Min et al. (2005b)], in which slow conformational fluctuation of the enzyme-substrate complexes was observed.

The effect of concentration on fluorescence intensity autocorrelation. We now consider the intensity autocorrelation under each of the four scenarios. We write $Q_{AA} = I_\alpha + J_\alpha$, where $I_\alpha = \mathrm{diag}\{\alpha_{11}, \ldots, \alpha_{nn}\}$, and $Q_{BB} = I_\beta + J_\beta$, where $I_\beta = \mathrm{diag}\{\beta_{11}, \ldots, \beta_{nn}\}$. For scenarios 1 and 2, we assume that both the enzyme and the enzyme-substrate complex fluctuate slowly: $\alpha_{ij}$ and $\beta_{ij}$ $(i \neq j)$ are negligible, but the sums $\alpha_{ii} = -\sum_{j\neq i}\alpha_{ij}$ and $\beta_{ii} = -\sum_{j\neq i}\beta_{ij}$ are not. Furthermore, we assume that in scenario 1 the enzyme fluctuation is much slower than the enzyme-substrate complex fluctuation (so $Q_{AA} = 0$ and $Q_{BB} = I_\beta$ in scenario 1), and that in scenario 2 the enzyme-substrate complex fluctuation is much slower (so $Q_{BB} = 0$ and $Q_{AA} = I_\alpha$ in scenario 2).

Theorem 4.2. For enzymes with fast cycle reset, the matrix governing the intensity autocorrelation is K. Its eigenvalues and the autocorrelation decay, under the four different scenarios, satisfy the following:

Scenario 1 ($Q_{AA} = 0$ and $Q_{BB} = I_\beta$). The autocorrelation decay is slower under lower concentration, and the dominating eigenvalues are given by

$$\kappa_i = \frac{1}{2}\Big(-\big([S]k_{1i} - \beta_{ii} + k_{-1i} + k_{2i}\big) + \sqrt{\big([S]k_{1i} - \beta_{ii} + k_{-1i} + k_{2i}\big)^2 + 4\beta_{ii}[S]k_{1i}}\Big).$$

Scenario 2 ($Q_{BB} = 0$ and $Q_{AA} = I_\alpha$). The autocorrelation decay is faster under lower concentration, and the dominating eigenvalues are given by

$$\kappa_i = \frac{1}{2}\Big(-\big([S]k_{1i} - \alpha_{ii} + k_{-1i} + k_{2i}\big) + \sqrt{\big([S]k_{1i} - \alpha_{ii} + k_{-1i} + k_{2i}\big)^2 + 4\alpha_{ii}(k_{-1i} + k_{2i})}\Big).$$

Scenario 3. The autocorrelation decay does not depend on the concentration.

Scenario 4. The autocorrelation decay is slower under lower concentration.

The proof of the theorem is given in the Appendix. Our results on the dependence of the turnover time autocorrelation and the fluorescence intensity autocorrelation on the substrate concentration show that, in order to have slower decay under higher substrate concentration (as seen in Figure 2), the fluctuations of the enzyme and of the enzyme-substrate complex cannot be fast; furthermore, the fluctuation of the enzyme-substrate complex needs to be slower than that of the enzyme.
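The dominating eigenvalues in Theorem 4.2 are the roots closest to zero of the per-conformation quadratic behind (A.2) in the Appendix. The sketch below, with hypothetical rate values for a single conformation i, evaluates that root and checks the monotonicity claimed for scenarios 1 and 2: κ moves toward zero as [S] grows when $\beta_{ii} = 0$ (slower decay at high concentration) and away from zero when $\alpha_{ii} = 0$.

```python
import math

# Dominating eigenvalue from (A.2) for one conformation i (hypothetical rates).
# (A.2) is equivalent to (kappa - a + S*k1) * (kappa - b + k) = S*k1*k with
# a = alpha_ii, b = beta_ii, k = km1 + k2, i.e. kappa^2 + B*kappa + C = 0.

def dominant_kappa(S, k1, km1, k2, alpha_ii, beta_ii):
    """Root of (A.2) closest to zero; for alpha_ii, beta_ii <= 0 both roots are real and <= 0."""
    k = km1 + k2
    B = S * k1 + k - alpha_ii - beta_ii
    C = alpha_ii * beta_ii - alpha_ii * k - beta_ii * S * k1
    return 0.5 * (-B + math.sqrt(B * B - 4.0 * C))


if __name__ == "__main__":
    k1, km1, k2 = 1.0, 3.0, 2.0
    for S in (1.0, 10.0, 100.0):
        s1 = dominant_kappa(S, k1, km1, k2, alpha_ii=0.0, beta_ii=-1.0)   # scenario 1
        s2 = dominant_kappa(S, k1, km1, k2, alpha_ii=-1.0, beta_ii=0.0)   # scenario 2
        print(S, s1, s2)
```

Setting $\alpha_{ii} = 0$ or $\beta_{ii} = 0$ in this quadratic reproduces the two closed forms stated in the theorem.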
In summary, each of the four scenarios yields a different autocorrelation
pattern, but only the one under scenario 2 matches the experimental finding. 
Therefore, we will focus on scenario 2 from now on. 

4.2. Continuous limit. To simplify the coefficients $a_i$ and $C_i$ and to address the number n of distinct conformations, we adopt the idea of the previous work [Kou et al. (2005), Kou (2008b)] and take a continuous limit. First, we let $n \to \infty$ and thereby model the transition rates as continuous variables with certain distributions. Consequently, we also treat the eigenvalues as continuous variables. Second, we assume that all the coefficients ($a_i$ and $C_i$) are proportional to the probability weights of the associated eigenvalues. This assumption is partly based on the fact that all the observed experimental correlations are positive. With these two assumptions, the covariances can be represented by

$$\mathrm{cov}(T^1, T^m) \propto \int \lambda^{m-1} f(\lambda)\,d\lambda, \qquad \mathrm{cov}(I(0), I(t)) \propto \int e^{\kappa(t - \Delta t)}\,g(\kappa)\,d\kappa,$$

where f and g are the corresponding density functions.

λ and κ are functions of the transition rates. Since the transition rates are always positive, a natural choice is to model them either as constants or as following Gamma distributions. In the previous work [Kou (2008b)] on the stochastic network model, the association rate $k_1$ and the dissociation rate $k_{-1}$ are modeled as constants, while the catalytic rate $k_2$ follows a Gamma distribution $\Gamma(a, b)$. We adopt these choices in our fitting.

We know from Section 4.1 that scenario 2 matches the experimental finding, so we take $Q_{BB} = 0$ and $Q_{AA} = I_\alpha$. Then the eigenvalue λ (based on Theorem 4.1 and its proof in the Appendix) is given by

$$\lambda = \frac{1}{1 + a^*(k_{-1} + k_2)/([S]k_1k_2)}, \tag{4.2}$$

and the eigenvalue κ (based on Theorem 4.2) is

$$\kappa = \frac{1}{2}\Big(-\big([S]k_1 + a^* + k_{-1} + k_2\big) + \sqrt{\big([S]k_1 + a^* + k_{-1} + k_2\big)^2 - 4a^*(k_{-1} + k_2)}\Big), \tag{4.3}$$

where $a^*$ stands for a generic $-\alpha_{ii}$ (since we are taking the continuous version). For the distribution of $a^*$ (i.e., the distribution of $-\alpha_{ii}$), we note that, first, its support should be the positive real line, and, second, $-\alpha_{ii} = \sum_{j\neq i}\alpha_{ij}$ is a sum of many random variables $\alpha_{ij}$ from a common distribution, so we expect the distribution of $a^*$ to be infinitely divisible. These two considerations lead us to assume a Gamma distribution $\Gamma(a_\alpha, b_\alpha)$ for $a^*$.
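To see how the continuous limit turns (4.2) and (4.3) into decay curves, the sketch below draws $k_2$ and $a^*$ from Gamma distributions and averages $\lambda^{m-1}$ and $e^{\kappa h}$ by Monte Carlo, where h stands for the time lag $t - \Delta t$. Two assumptions are ours: Python's `random.gammavariate(shape, scale)` uses a shape/scale parameterization, which may differ from the convention behind $\Gamma(a, b)$ here, and the parameter values are illustrative rather than the fitted ones. Reusing the same seed for every concentration makes the comparison across [S] sample by sample.

```python
import math
import random


def lam(S, k1, km1, k2, a_star):
    """Eigenvalue lambda from (4.2)."""
    return 1.0 / (1.0 + a_star * (km1 + k2) / (S * k1 * k2))


def kappa(S, k1, km1, k2, a_star):
    """Eigenvalue kappa from (4.3); always real and <= 0."""
    B = S * k1 + a_star + km1 + k2
    return 0.5 * (-B + math.sqrt(B * B - 4.0 * a_star * (km1 + k2)))


def mixture_curves(S, k1, km1, k2_shape, k2_scale, a_shape, a_scale,
                   lags, times, n=5000, seed=0):
    """Monte Carlo estimates of E[lambda^(m-1)] for m in lags and
    E[exp(kappa * h)] for h in times, with k2 and a* Gamma distributed."""
    rng = random.Random(seed)              # same seed -> same draws for every [S]
    draws = [(rng.gammavariate(k2_shape, k2_scale),
              rng.gammavariate(a_shape, a_scale)) for _ in range(n)]
    turnover = [sum(lam(S, k1, km1, k2, a) ** (m - 1) for k2, a in draws) / n
                for m in lags]
    intensity = [sum(math.exp(kappa(S, k1, km1, k2, a) * h) for k2, a in draws) / n
                 for h in times]
    return turnover, intensity


if __name__ == "__main__":
    # illustrative (hypothetical) values, not the fitted parameters
    for S in (1.0, 10.0):
        T, I = mixture_curves(S, 1.0, 3.0, 5.0, 1.0, 2.0, 1.0,
                              lags=[2, 4, 8], times=[0.5, 1.0, 2.0])
        print(S, [round(x, 3) for x in T], [round(x, 3) for x in I])
```

Because λ and κ both increase with [S] for every draw, both averaged curves decay more slowly at the higher concentration.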

4.3. Data fitting. The data available to us include the intensity autocorrelation under three concentrations, [S] = 380, 100 and 20 μM (micromolar), and the turnover time autocorrelation under two concentrations, [S] = 100 and 20 μM. We calculated the eigenvalues based on (4.2) and (4.3), where $k_1$ and $k_{-1}$ are constants, and $a^*$ and $k_2$ follow the distributions $\Gamma(a_\alpha, b_\alpha)$ and $\Gamma(a, b)$, respectively. The parameters of interest are $k_1$, $k_{-1}$, a, b, $a_\alpha$ and $b_\alpha$. The best fits are found by minimizing the squared distance between the theoretical and observed values. The parameters are estimated as follows: $k_1 = 1.785 \times 10^3\ (\mu\mathrm{M})^{-1}\mathrm{s}^{-1}$, $k_{-1} = 6.170 \times 10^3\ \mathrm{s}^{-1}$, $a = 13.49$, $b = 2.279\ \mathrm{s}^{-1}$, $a_\alpha = 0.6489$, and $b_\alpha = 1.461 \times 10^3$ s (s denotes seconds). Figure 3 shows the fit of the autocorrelation functions and the distributions of the eigenvalues.

Fig. 3. Left: data fitting to the intensity and turnover time autocorrelations based on (4.2) and (4.3). Right: the corresponding distributions of the eigenvalues λ and κ.
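The fitting criterion, minimizing the squared distance between theoretical and observed autocorrelations, can be illustrated on a deliberately reduced problem. The sketch below is not the fitting procedure behind Figure 3: it assumes constant $k_1$, $k_{-1}$, $k_2$, a single value of $a^*$ in (4.2), synthetic observations generated from the model itself, and a plain grid search standing in for whatever optimizer was actually used. It only shows how the squared-distance objective pools several concentrations.

```python
# Toy illustration of the squared-distance criterion (NOT the Figure 3 fit):
# constant rates, a single a* in (4.2), synthetic "observations", grid search.

def lam(S, k1, km1, k2, a_star):
    """Eigenvalue lambda from (4.2)."""
    return 1.0 / (1.0 + a_star * (km1 + k2) / (S * k1 * k2))


def model_curve(S, a_star, lags, k1=1.0, km1=3.0, k2=2.0):
    """Single-eigenvalue turnover autocorrelation shape lambda^(m-1)."""
    return [lam(S, k1, km1, k2, a_star) ** (m - 1) for m in lags]


def squared_distance(a_star, observed, lags):
    """Sum of squared differences, pooled over all concentrations and lags."""
    return sum((th - ob) ** 2
               for S, obs in observed.items()
               for th, ob in zip(model_curve(S, a_star, lags), obs))


def grid_fit(observed, lags, grid):
    """Grid point with the smallest squared distance."""
    return min(grid, key=lambda a: squared_distance(a, observed, lags))


if __name__ == "__main__":
    lags = [2, 3, 4, 5, 6]
    # synthetic data generated from the model with a* = 2.0 at two concentrations
    observed = {S: model_curve(S, 2.0, lags) for S in (1.0, 10.0)}
    grid = [i / 10 for i in range(5, 41)]       # a* from 0.5 to 4.0
    print(grid_fit(observed, lags, grid))
```

In the actual fit the model curves would come from the Gamma mixtures over $k_2$ and $a^*$, and all six parameters would be searched jointly.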

Figure 3 shows that our model gives a good fit to the turnover time autocorrelation and an adequate fit to the fluorescence intensity autocorrelation, capturing the main trend in the intensity autocorrelation. The distributions of the eigenvalues in the right panels clearly indicate that higher substrate concentration corresponds to larger eigenvalues, which are in turn responsible for the slower decay of the autocorrelations.

Our model thus offers an adequate explanation of the observed decay 
patterns of the autocorrelation functions. The stochastic network model tells 
us why the decay must be multi-exponential. It further explains why the 
decay is slower under higher substrate concentration. Our consideration of 
the different scenarios also provides insight into the enzyme's conformational
fluctuation: slow fluctuation, particularly of the enzyme-substrate complex, 
gives rise to the experimentally observed autocorrelation decay pattern. 

5. Discussion. In this article we explored the stochastic network model 
previously developed to account for the empirical puzzles arising from recent 
single-molecule enzyme experiments. We conducted a detailed study of the 
autocorrelation functions of the turnover time and of the fluorescence intensity and investigated the effect of substrate concentration on the correlations.

Our analytical results show that (a) the stochastic network model gives 
multi-exponential autocorrelation decay of both the turnover times and the 
fluorescence intensity, agreeing with the experimental observation; (b) un- 
der suitable conditions, the autocorrelation decays more slowly with higher 
concentration, also agreeing with the experimental result; (c) the slower au- 
tocorrelation decay under higher concentration implies that the fluctuation 
of the enzyme-substrate complex should be slow, corroborating the conclu- 
sion from other single-molecule experiments [Lu, Xun and Xie (1998), Yang 
et al. (2003), Min et al. (2005b)]. In addition to providing a theoretical underpinning of the experimental observations, the numerical results from the model fit the experimental autocorrelations well, as seen in Section 4.

Some problems remain open for future investigation: 

(1) When we discussed the dependence of intensity autocorrelation on substrate concentration in Section 4.1, we approximated the fluctuation transition matrix with its diagonal entries. This simple approximation provides useful insight into the decay pattern under different concentrations. A better approximation that goes beyond the diagonal entries is desirable; it might lead to a better fit to the experimental data.

(2) We used Gamma distributions to model the transition rates. This choice is purely statistical. Can it be derived from a physical angle? If so, the connection would not only lead to better estimation but also provide new insight into the underlying mechanism of the enzyme's conformational fluctuation.

(3) We used the continuous limit $n \to \infty$ for the data fitting so that the number of parameters is reduced from more than 3n to a manageable six. Obtaining standard errors for the estimates remains open for future investigation. The main difficulties are the lack of tractable tools to approximate the standard error of the autocorrelation estimates and the challenge of carrying out a Monte Carlo estimate (n needs to be quite large for an ad hoc Monte Carlo simulation, but such an n brings back a large number of unspecified parameters).

Single-molecule biophysics, like many newly emerging fields, is interdisci- 
plinary. It lies at the intersection of biology, chemistry and physics. Owing 
to the stochastic nature of the nano world, single-molecule biophysics also 
presents statisticians with new problems and new challenges. The stochas- 
tic model for single-enzyme reaction represents only one such case among 
many interesting opportunities. We hope this article will generate further 
interest in solving biophysical problems with modern statistical methods; 
and we believe that the knowledge and tools gained in this process will in 
turn advance the development of statistics and probability. 

APPENDIX: PROOFS 

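Proof of Lemma 3.4. Using first-step analysis, we have

$$G\begin{pmatrix} P_{AC} \\ P_{BC} \end{pmatrix} = -\begin{pmatrix} 0 \\ Q_{BC} \end{pmatrix},$$

where

$$G = \begin{pmatrix} Q_{AA} - Q_{AB} & Q_{AB} \\ Q_{BA} & Q_{BB} - (Q_{BA} + Q_{BC}) \end{pmatrix}, \qquad G^{-1} = \begin{pmatrix} L & M \\ N & R \end{pmatrix}.$$

Only the diagonal elements of G are negative, and its row sums are either 0 or negative. Thus, $-G$ is a stable matrix [Horn and Johnson (1985)], which always has an inverse. Thus, we have

$$\begin{pmatrix} P_{AC} \\ P_{BC} \end{pmatrix} = -G^{-1}\begin{pmatrix} 0 \\ Q_{BC} \end{pmatrix} = \begin{pmatrix} -MQ_{BC} \\ -RQ_{BC} \end{pmatrix}.$$

For $P_{E^0_iE_j}$, similarly, we have

$$P_{CA} = -(Q_{CC} - Q_{CA})^{-1}Q_{CA} = (I - Q_{CA}^{-1}Q_{CC})^{-1}. \qquad \square$$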

Proof of Lemma 3.5. For $E(T_{E_iE^0_j})$, when first-step analysis is applied, the first-step probability should be conditioned on the exit state $E^0_j$, that is,

$$P(E_i \text{ jumps to } E_k \text{ first} \mid \text{exit at } E^0_j) = P(E_i \text{ jumps to } E_k \text{ first})\,\frac{P_{E_kE^0_j}}{P_{E_iE^0_j}}.$$

Thus, we have the following equation:

$$E(T_{E_iE^0_j}) = \frac{1 + k_{1i}[S]\,\dfrac{P_{ES_iE^0_j}}{P_{E_iE^0_j}}\,E(T_{ES_iE^0_j}) + \sum_{k\neq i}\alpha_{ik}\,\dfrac{P_{E_kE^0_j}}{P_{E_iE^0_j}}\,E(T_{E_kE^0_j})}{k_{1i}[S] + \sum_{k\neq i}\alpha_{ik}}.$$

A similar expression can be derived for $E(T_{ES_iE^0_j})$. Together we have

$$\begin{pmatrix} E_{AC} \\ E_{BC} \end{pmatrix} = -G^{-1}\begin{pmatrix} P_{AC} \\ P_{BC} \end{pmatrix} = \begin{pmatrix} LMQ_{BC} + MRQ_{BC} \\ NMQ_{BC} + RRQ_{BC} \end{pmatrix}. \qquad \square$$

Proof of Theorem 3.6. $\mathrm{cov}(T^1, T^m) = E(T^1T^m) - E(T^1)E(T^m)$. The first term $E(T^1T^m)$ can be expressed as

$$E(T^1T^m) = \sum_{i,j,k,l} w(E_i)\,P_{E_iE^0_j}\,E(T_{E_iE^0_j})\,P_{E^0_jE_k}\,P^{(m-2)}_{E_kE_l}\,E(T_{E_l}),$$

that is, the system starts the first turnover event from $E_i$, ends it in $E^0_j$, then starts the second from $E_k$, repeats this procedure $m - 2$ times, and finally starts the last turnover from $E_l$. Note that $[P^{(m-2)}_{E_kE_l}]_{n\times n} = (P_{AC}P_{CA})^{m-2}$. Thus, using the matrices defined in Lemmas 3.1 to 3.5, we have

$$\mathrm{cov}(T^1, T^m) = wE_{AC}P_{CA}(P_{AC}P_{CA})^{m-2}\mu_A - (w\mu_A)^2.$$

Applying the results of Lemmas 3.4 and 3.5 and the facts that $R = (I - Q_{AB}^{-1}Q_{AA})M$ and $(w\mu_A)^2 = -w(L + M)\mathbf{1}\,w\mu_A = -w\big(L + M(I - Q_{AB}^{-1}Q_{AA})\big)\mathbf{1}\,w\mu_A$, we can finally arrange the covariance as

$$\mathrm{cov}(T^1, T^m) = -w\big(L + M(I - Q_{AB}^{-1}Q_{AA})\big)\big[(P_{AC}P_{CA})^{m-1} - \mathbf{1}w\big]\mu_A. \qquad \square$$

Proof of Corollary 3.7. We only need to prove that $\mathbf{1}$ and w are, respectively, the right and left eigenvectors of $P_{AC}P_{CA}$ associated with the eigenvalue 1. The first is a direct consequence of the fact that $P_{AC}P_{CA}$ is a stochastic matrix. The second can be verified by observing that $-wMQ_{BC}(I - Q_{CA}^{-1}Q_{CC})^{-1} = w$ through (3.1). □

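Proof of Theorem 3.8. In (3.5), the second term represents the independent background noise during the period $(t - \Delta t, t)$. Thus,

$$\begin{aligned}
\mathrm{cov}(I(0), I(t)) &= E[N_t(T_{\mathrm{on}}(t))\,N_0(T_{\mathrm{on}}(0))] - E[N_t(T_{\mathrm{on}}(t))]\,E[N_0(T_{\mathrm{on}}(0))] \\
&= u^2\big[E(T_{\mathrm{on}}(t)T_{\mathrm{on}}(0)) - E(T_{\mathrm{on}}(t))E(T_{\mathrm{on}}(0))\big].
\end{aligned}$$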

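Let $\mathcal{S} = \{E_1, \ldots, E_n, ES_1, \ldots, ES_n, E^0_1, \ldots, E^0_n\}$ be the set of all possible states. Let $X_t$ be the process evolving according to (2.1). Let $\pi_i$ be the equilibrium probability of state i, and let $P_{ij}(s)$ be the transition probability from state i to state j after time s. We have

$$\begin{aligned}
&E(T_{\mathrm{on}}(t)T_{\mathrm{on}}(0)) - E(T_{\mathrm{on}}(t))E(T_{\mathrm{on}}(0)) \\
&\quad= \sum_{i,j,k,l\in\mathcal{S}} \pi_iP_{ij}(\Delta t)P_{jk}(t - \Delta t)P_{kl}(\Delta t)\,E(T_{\mathrm{on}}(0)\mid X_{-\Delta t} = i, X_0 = j)\,E(T_{\mathrm{on}}(t)\mid X_{t-\Delta t} = k, X_t = l) \\
&\qquad- \sum_{i,j,k,l\in\mathcal{S}} \pi_iP_{ij}(\Delta t)\,\pi_kP_{kl}(\Delta t)\,E(T_{\mathrm{on}}(0)\mid X_{-\Delta t} = i, X_0 = j)\,E(T_{\mathrm{on}}(t)\mid X_{t-\Delta t} = k, X_t = l) \\
&\quad= \sum_{i,j,k,l\in\mathcal{S}} \pi_iP_{ij}(\Delta t)P_{kl}(\Delta t)\,E(T_{\mathrm{on}}(0)\mid X_{-\Delta t} = i, X_0 = j)\,E(T_{\mathrm{on}}(t)\mid X_{t-\Delta t} = k, X_t = l)\,\{P_{jk}(t - \Delta t) - \pi_k\}.
\end{aligned}$$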

The probability transition matrix $[P_{ij}(t)]_{3n\times 3n}$ is the matrix exponential of the generating matrix (2.1): $[P_{ij}(t)]_{3n\times 3n} = \exp(Qt)$. Zero is an eigenvalue of Q with right eigenvector $\mathbf{1}$ and left eigenvector π, the stationary distribution. Assume Q is diagonalizable. Let $\mu_i$, $i = 2, 3, \ldots, 3n$, denote the other eigenvalues, and let $\xi_i$ and $\eta_i^T$ be the corresponding right and left eigenvectors, normalized so that $\eta_i^T\xi_i = 1$. We have

$$\exp(Qt) = \mathbf{1}\pi + \sum_{i=2}^{3n} e^{\mu_it}\,\xi_i\eta_i^T.$$

Therefore, we can rewrite

$$\mathrm{cov}(I(0), I(t)) \propto \sum_{i=2}^{3n} C_i\,e^{\mu_i(t - \Delta t)}.$$
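The spectral step can be made concrete on a hypothetical two-state chain with generator $\begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$, for which $\pi = (b, a)/(a + b)$, $\mu_2 = -(a + b)$, $\xi_2 = (a, -b)^T/(a + b)$ and $\eta_2 = (1, -1)^T$. The sketch compares this closed form with a truncated Taylor series for the matrix exponential; the difference $P_{jk}(t) - \pi_k$ is then a single exponential, mirroring how each eigenvalue contributes one exponential term to the covariance.

```python
import math

# Hypothetical two-state chain with generator Q = [[-a, a], [b, -b]].

def spectral_P(t, a, b):
    """exp(Qt) = 1 pi + exp(mu_2 t) xi_2 eta_2^T with mu_2 = -(a + b)."""
    s = a + b
    pi = (b / s, a / s)                 # stationary distribution (left eigvec for 0)
    xi = (a / s, -b / s)                # right eigenvector for mu_2
    eta = (1.0, -1.0)                   # left eigenvector for mu_2, eta . xi = 1
    decay = math.exp(-s * t)
    return [[pi[k] + decay * xi[j] * eta[k] for k in range(2)] for j in range(2)]


def expm_series(Q, t, terms=60):
    """Truncated Taylor series sum_{n < terms} (Qt)^n / n! for a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[sum(term[i][k] * Q[k][j] for k in range(2)) * t / n
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


if __name__ == "__main__":
    a, b, t = 1.0, 2.0, 0.7
    print(spectral_P(t, a, b))
    print(expm_series([[-a, a], [b, -b]], t))
```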

To prove Theorem 3.9, we need the following two useful lemmas on the eigenvalues of a matrix.

Lemma A.1 [Theorems 6.1.1 and 6.4.1 of Horn and Johnson (1985)]. Let $A = [a_{ij}] \in M_n$, where $M_n$ is the set of all $n \times n$ complex matrices. Let $\alpha \in [0, 1]$ be given, and define $R'_i$ and $C'_i$ as the deleted row and column sums of A, respectively,

$$R'_i = \sum_{j\neq i}|a_{ij}|, \qquad C'_i = \sum_{j\neq i}|a_{ji}|.$$

Then, (1) all the eigenvalues of A are located in the union of n discs

$$\bigcup_{i=1}^{n}\big\{z \in \mathbb{C} : |z - a_{ii}| \le R_i'^{\,\alpha}\,C_i'^{\,1-\alpha}\big\}. \tag{A.1}$$

(2) Furthermore, if a union of k of these n discs forms a connected region that is disjoint from all the remaining $n - k$ discs, then there are precisely k eigenvalues of A in this region.
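The disc bound in Lemma A.1 is easy to evaluate. The sketch below computes the discs of (A.1) for a hypothetical 2 × 2 matrix with one large and one small diagonal entry, a miniature of the O(δ) versus O(1) separation used in the proof of Theorem 3.9, and checks that each of the two disjoint discs contains exactly one eigenvalue, as part (2) states.

```python
import cmath


def ostrowski_discs(A, alpha=0.5):
    """(center, radius) pairs from (A.1): radius = R'_i^alpha * C'_i^(1 - alpha)."""
    n = len(A)
    discs = []
    for i in range(n):
        R = sum(abs(A[i][j]) for j in range(n) if j != i)   # deleted row sum
        C = sum(abs(A[j][i]) for j in range(n) if j != i)   # deleted column sum
        discs.append((A[i][i], R ** alpha * C ** (1.0 - alpha)))
    return discs


def eig2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    r = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + r) / 2.0, (tr - r) / 2.0


def in_union(z, discs, tol=1e-12):
    """True if z lies in at least one disc (up to a small tolerance)."""
    return any(abs(z - c) <= rad + tol for c, rad in discs)


if __name__ == "__main__":
    A = [[-10.0, 4.0], [0.25, -1.0]]     # one "fast" and one "slow" diagonal entry
    print(ostrowski_discs(A), eig2(A))
```

With α = 1 the radii reduce to the classical Geršgorin row sums, and with α = 0 to the column sums.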

Lemma A.2 [pages 63-67 of Wilkinson (1988)]. Let A and B be matrices with elements satisfying $|a_{ij}| \le 1$, $|b_{ij}| \le 1$. If $\lambda_1$ is a simple eigenvalue (i.e., an eigenvalue with multiplicity 1) of A, then for the matrix $A + \varepsilon B$, where ε is sufficiently small, there is an eigenvalue $\lambda_1(\varepsilon)$ of $A + \varepsilon B$ such that

$$|\lambda_1(\varepsilon) - \lambda_1| = O(\varepsilon).$$

Furthermore, if $x_1$ is an eigenvector of A associated with $\lambda_1$, then there is an eigenvector $x_1(\varepsilon)$ of $A + \varepsilon B$ associated with $\lambda_1(\varepsilon)$ such that

$$|x_1(\varepsilon) - x_1| = O(\varepsilon).$$



Note that since dividing a matrix by a constant scales all of its eigenvalues by the same factor, the condition that the entries of A and B are bounded by 1 can be relaxed to the condition that they are bounded by some finite positive number.

Proof of Theorem 3.9. According to Lemma A.1, all the eigenvalues of Q must lie in the union of discs centered at $Q_{ii}$ with radii defined by (A.1). If we take α = 1/2 in (A.1), then the first n discs, corresponding to the diagonal entries of $Q_{AA} - Q_{AB}$, have centers O(1) and radii $O(\delta^{1/2})$; the second n discs, corresponding to the diagonal entries of $Q_{BB} - Q_{BC} - Q_{BA}$, have centers O(1) and radii O(1); the third n discs, corresponding to the diagonal entries of $Q_{CC} - Q_{CA}$, have centers O(δ) and radii $O(\delta^{1/2})$. Thus, for δ large enough, the union of the first 2n discs does not overlap with the union of the last n discs, so we know from Lemma A.1 that Q has 2n eigenvalues of order $O(\delta^{1/2})$ in the union of the first 2n discs and n other eigenvalues of order O(δ) in the union of the last n discs.

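For the n eigenvalues of order O(δ), consider the following two matrices:

$$Y = \frac{1}{\delta}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ Q_{CA} & 0 & -Q_{CA} \end{pmatrix}, \qquad Z = \begin{pmatrix} Q_{AA} - Q_{AB} & Q_{AB} & 0 \\ Q_{BA} & Q_{BB} - Q_{BA} - Q_{BC} & Q_{BC} \\ 0 & 0 & Q_{CC} \end{pmatrix}.$$

We have $\frac{1}{\delta}Q = Y + \frac{1}{\delta}Z$. Zero is an eigenvalue of Y with multiplicity 2n, and the other n eigenvalues of Y are $-q_1, -q_2, \ldots, -q_n$, where $q_i$ denotes the ith diagonal entry of $Q_{CA}/\delta$. For large δ, according to Lemma A.2, there exist n eigenvalues of $\frac{1}{\delta}Q$ that satisfy

$$\mu_{2n+i}/\delta = -q_i + O(\delta^{-1}), \qquad i = 1, 2, \ldots, n,$$

that is,

$$|\mu_i + \delta q_{i-2n}| = \delta\,O(\delta^{-1}) = O(1), \qquad i = 2n+1, 2n+2, \ldots, 3n.$$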

Now, the $2n - 1$ nonzero eigenvalues of Q of order $O(\delta^{1/2})$ are the solutions of

$$|Q - \mu_iI_{3n}| = 0, \qquad i = 2, \ldots, 2n.$$

For large δ, the matrix $Q_{CC} - Q_{CA} - \mu_iI_n$ is invertible, since it is strictly diagonally dominant. We can decompose the determinant as

$$|Q - \mu_iI_{3n}| = |U(\mu_i)|\,|Q_{CC} - Q_{CA} - \mu_iI_n| = 0,$$

where

$$U(\mu_i) = \begin{pmatrix} Q_{AA} - Q_{AB} - \mu_iI_n & Q_{AB} \\ Q_{BA} - Q_{BC}(Q_{CC} - Q_{CA} - \mu_iI_n)^{-1}Q_{CA} & Q_{BB} - Q_{BA} - Q_{BC} - \mu_iI_n \end{pmatrix}.$$

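Therefore, $\mu_i$, $i = 2, \ldots, 2n$, is also an eigenvalue of the matrix

$$\begin{pmatrix} Q_{AA} - Q_{AB} & Q_{AB} \\ Q_{BA} - Q_{BC}(Q_{CC} - Q_{CA} - \mu_iI_n)^{-1}Q_{CA} & Q_{BB} - Q_{BA} - Q_{BC} \end{pmatrix} = K + S,$$

with

$$S = \begin{pmatrix} 0 & 0 \\ -Q_{BC} - Q_{BC}(Q_{CC} - Q_{CA} - \mu_iI_n)^{-1}Q_{CA} & 0 \end{pmatrix}.$$

We note that $I_n + (Q_{CC} - Q_{CA} - \mu_iI_n)^{-1}Q_{CA} = (W - I_n)^{-1}W$, where $W = Q_{CA}^{-1}(Q_{CC} - \mu_iI_n)$, so the nonzero block of S equals $-Q_{BC}(W - I_n)^{-1}W$. Since $Q_{CA}$ is of order O(δ) and $\mu_i$ is of order $O(\delta^{1/2})$, the entries of W are of order $O(\delta^{-1/2})$, and so are the entries of S. Applying Lemma A.2 to K + S tells us that for each $\mu_i$ there must be an eigenvalue $\kappa_i$ of K such that

$$\mu_i = \kappa_i + O(\delta^{-1/2}), \qquad i = 2, \ldots, 2n. \qquad \square$$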



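Proof of Theorem 4.1. We know from Lemma 3.1 that

$$M = \big[Q_{BB} - Q_{BC} - (Q_{BB} - Q_{BA} - Q_{BC})Q_{AB}^{-1}Q_{AA}\big]^{-1}.$$

Scenario 1. When $Q_{AA} = 0$, $M = (Q_{BB} - Q_{BC})^{-1}$, so the eigenvalues and eigenvectors of $-MQ_{BC}$ have nothing to do with [S].

Scenario 2. When $Q_{BB} = 0$, writing $Q_{AB} = [S]\tilde{Q}_{AB}$ with $\tilde{Q}_{AB} = \mathrm{diag}\{k_{11}, \ldots, k_{1n}\}$, we have

$$(-MQ_{BC})^{-1} = -Q_{BC}^{-1}M^{-1} = I_n - \frac{1}{[S]}\,Q_{BC}^{-1}(Q_{BA} + Q_{BC})\tilde{Q}_{AB}^{-1}Q_{AA}.$$

Thus, if $-MQ_{BC}$ has eigenvalue $\lambda_i(1)$ when [S] = 1, then for general [S], $-MQ_{BC}$ has eigenvalue

$$\lambda_i([S]) = \frac{1}{1 - \big(1 - 1/\lambda_i(1)\big)/[S]}.$$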

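Scenario 3. We write $Q_{AA} = \tau\tilde{Q}_{AA}$, where τ is large. Then the inverse of $-MQ_{BC}$ is

$$(-MQ_{BC})^{-1} = I - Q_{BC}^{-1}Q_{BB} + Q_{BC}^{-1}(Q_{BB} - Q_{BA} - Q_{BC})Q_{AB}^{-1}Q_{AA} = \tau\Big[\frac{1}{\tau}(I - Q_{BC}^{-1}Q_{BB}) + Q_{BC}^{-1}(Q_{BB} - Q_{BA} - Q_{BC})Q_{AB}^{-1}\tilde{Q}_{AA}\Big].$$

Suppose the eigenvalues of $Q_{BC}^{-1}(Q_{BB} - Q_{BA} - Q_{BC})Q_{AB}^{-1}\tilde{Q}_{AA}$ are $0, \lambda^*_2, \ldots, \lambda^*_n$. Then, according to Lemma A.2, the eigenvalues of $\frac{1}{\tau}(I_n - Q_{BC}^{-1}Q_{BB}) + Q_{BC}^{-1}(Q_{BB} - Q_{BA} - Q_{BC})Q_{AB}^{-1}\tilde{Q}_{AA}$ are

$$O(\tau^{-1}),\ \lambda^*_2 + O(\tau^{-1}),\ \ldots,\ \lambda^*_n + O(\tau^{-1}).$$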
Thus, the eigenvalues of $-MQ_{BC}$ are

$$1,\ \frac{1}{\tau\lambda^*_2 + O(1)},\ \ldots,\ \frac{1}{\tau\lambda^*_n + O(1)};$$

namely, all the non-unit eigenvalues of $-MQ_{BC}$ are of order $\tau^{-1}$.

Scenario 4. Using the same method as in scenario 3, we can show that all the non-unit eigenvalues of $-MQ_{BC}$ are of order $\tau^{-1}$. □
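The $\tau^{-1}$ scaling in scenario 3 can be observed numerically. The sketch below assumes a hypothetical two-conformation model with $Q_{AA} = \tau\tilde{Q}_{AA}$ and a fixed $Q_{BB}$, forms $-MQ_{BC}$ from the expression for M given at the start of this proof, and reports τ times the non-unit eigenvalue, which should settle to a constant ($1/\lambda^*_2$) as τ grows. All numerical values are illustrative.

```python
# Hypothetical n = 2 illustration of Theorem 4.1, scenario 3: Q_AA = tau * Qt_AA.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def turnover_matrix(S, k1, km1, k2, Q_AA, Q_BB):
    """-M Q_BC with M = [Q_BB - Q_BC - (Q_BB - Q_BA - Q_BC) Q_AB^{-1} Q_AA]^{-1}."""
    Q_AB_inv = [[1.0 / (S * k1[0]), 0.0], [0.0, 1.0 / (S * k1[1])]]
    D = [[Q_BB[i][j] - (km1[i] + k2[i] if i == j else 0.0) for j in range(2)]
         for i in range(2)]                                  # Q_BB - Q_BA - Q_BC
    DX = matmul(D, matmul(Q_AB_inv, Q_AA))
    M_inv = [[Q_BB[i][j] - (k2[i] if i == j else 0.0) - DX[i][j] for j in range(2)]
             for i in range(2)]
    M = inv2(M_inv)
    return [[-M[i][j] * k2[j] for j in range(2)] for i in range(2)]


def scenario3_eigenvalue(tau):
    """Non-unit eigenvalue of -M Q_BC (trace - 1, since the matrix is stochastic)."""
    Q_AA = [[-0.5 * tau, 0.5 * tau], [0.2 * tau, -0.2 * tau]]
    Q_BB = [[-0.3, 0.3], [0.1, -0.1]]
    P = turnover_matrix(1.0, (1.0, 2.0), (3.0, 1.0), (2.0, 4.0), Q_AA, Q_BB)
    return P[0][0] + P[1][1] - 1.0


if __name__ == "__main__":
    for tau in (10.0, 100.0, 1000.0, 10000.0):
        print(tau, tau * scenario3_eigenvalue(tau))
```

For these made-up rates one can check by hand that the non-unit eigenvalue equals $1/(1.48\tau + 1.175)$, so τ times it approaches $1/1.48$.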

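Proof of Theorem 4.2. The matrix K can be written as

$$K = \begin{pmatrix} I_\alpha - Q_{AB} & Q_{AB} \\ Q_{BA} + Q_{BC} & I_\beta - Q_{BA} - Q_{BC} \end{pmatrix} + \begin{pmatrix} J_\alpha & 0 \\ 0 & J_\beta \end{pmatrix} = T + \begin{pmatrix} J_\alpha & 0 \\ 0 & J_\beta \end{pmatrix}.$$

We thus know from Lemma A.2 that the eigenvalues of K can be approximated by the eigenvalues of T. If $I_\alpha - Q_{AB} - \kappa I_n$ is invertible, then

$$|T - \kappa I_{2n}| = |I_\alpha - Q_{AB} - \kappa I_n| \times \big|I_\beta - Q_{BA} - Q_{BC} - \kappa I_n + (Q_{BA} + Q_{BC})(-I_\alpha + Q_{AB} + \kappa I_n)^{-1}Q_{AB}\big|,$$

and we know that any eigenvalue κ of T must make the second determinant on the right-hand side zero. This determinant only involves diagonal matrices, so we have

$$\kappa = \alpha_{ii} - [S]k_{1i} + \frac{[S]k_{1i}(k_{2i} + k_{-1i})}{\kappa - \beta_{ii} + k_{2i} + k_{-1i}}. \tag{A.2}$$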

If $I_\alpha - Q_{AB} - \kappa I_n$ is not invertible, then there is at least one j such that $\kappa = \alpha_{jj} - k_{1j}[S]$. But it can be verified that, in order for κ to be an eigenvalue of T, there must exist another $i \neq j$ such that (A.2) holds for this κ. Therefore, any eigenvalue must be a root of (A.2).

Equation (A.2) has two negative roots for each i, but we only need to consider the root closer to 0, since it dominates the decay.

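Scenario 1. $\alpha_{ii} = 0$. The root is

$$\kappa_i = \frac{1}{2}\Big(-\big([S]k_{1i} - \beta_{ii} + k_{-1i} + k_{2i}\big) + \sqrt{\big([S]k_{1i} - \beta_{ii} + k_{-1i} + k_{2i}\big)^2 + 4\beta_{ii}[S]k_{1i}}\Big),$$

which is monotone decreasing in [S].

Scenario 2. $\beta_{ii} = 0$. The root is

$$\kappa_i = \frac{1}{2}\Big(-\big([S]k_{1i} - \alpha_{ii} + k_{-1i} + k_{2i}\big) + \sqrt{\big([S]k_{1i} - \alpha_{ii} + k_{-1i} + k_{2i}\big)^2 + 4\alpha_{ii}(k_{-1i} + k_{2i})}\Big),$$

which is monotone increasing in [S].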
Scenario 3. $Q_{AA} = \tau\tilde{Q}_{AA}$, where τ is large. Following the same method as in the proof of Theorem 3.9, we can show that n eigenvalues of K are of order O(τ) and do not contribute much to the correlation. The other eigenvalues, which govern the decay pattern, can be approximated by the eigenvalues of $Q_{BB} - Q_{BA} - Q_{BC}$, which do not depend on the concentration [S].

Scenario 4. $Q_{BB} = \tau\tilde{Q}_{BB}$, where τ is large. Using the same method as in the proof of Theorem 3.9, we can show that the dominating eigenvalues of K can be approximated by the eigenvalues of $Q_{AA} - Q_{AB}$. Since we know that $Q_{AA} \approx I_\alpha$, the eigenvalues of $Q_{AA} - Q_{AB}$ are approximately $\alpha_{ii} - [S]k_{1i}$, which are monotone decreasing in [S]. □